A Case for Software Managed Coherence in Many-core Processors
نویسندگان
چکیده
Processor vendors are integrating more and more cores into their chip. These many-core processors usually implement hardware coherence mechanisms, but when the core count goes to hundreds or more, it becomes prohibitively difficult to design and verify efficient hardware coherence support. Despite this, many parallel applications, for example RMS applications [9], show little data sharing, which suggests that providing a complex hardware coherence implementation wastes hardware budget and design effort. Moreover, in some increasingly important domains, such as server and cloud computing, multiple applications may run on a single many-core chip. Those applications require coherence support among the cores that they are running on, but not between different applications. This indicates a strong requirement for dynamically reconfigurable coherence domains, which is extremely hard to support with hardware-only mechanisms. In addition, hardware coherence is believed to be too complex to support heterogeneous platforms such as combined CPU-GPU systems, regardless whether the GPU is integrated or discrete. In this paper, we argue that software managed coherence is a better choice than hardware coherence for many-core processors. We believe that software managed coherence can make better use of silicon, efficiently support emerging applications, dynamically reconfigure coherence domain, and most importantly, still be able to provide performance comparable to hardware coherence. We implemented a prototype system with software managed coherence over a partially-shared address space and show promising results.
منابع مشابه
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
متن کاملDesigning and Dismounting an Intelligent System of Irrigation Management for Greenhouse based on Delphi Software
The drought continuity and also restricting watery sources caused agriculture section forgetold flooding methods for optimum water exploitation and proceeding new irrigation systems.New generation of irrigation systems called intelligent systems is a new solution leading toexploiting water increase to higher than 80%. In order to measure sensors and to controlprocessors in designing and dismoun...
متن کاملAutomatic management of Software Programmable Memories in Many-core Architectures
Software Programmable Memories, or SPMs, are raw on-chip memories that are not implicitly managed by the processor hardware, but explicitly by software. For example, while caches fetch data from memories automatically and maintain coherence with other caches, SPMs explicitly manage data movement between memories and other SPMs through software instructions. SPMs make the design of on-chip memor...
متن کاملCross-Layer Customization for low Power and High Performance Embedded Multi-Core Processors
Title of dissertation: CROSS-LAYER CUSTOMIZATION FOR LOW POWER AND HIGH PERFORMANCE EMBEDDED MULTI-CORE PROCESSORS Chenjie Yu, Doctor of Philosophy, 2010 Dissertation directed by: Assistant Professor Peter Petrov Electrical and Computer Engineering Due to physical limitations and design difficulties, computer processor architecture has shifted to multi-core and even many-core based approaches i...
متن کاملEfficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of schedulation of hardware resources regarding the concurrency of threads. In this paper, for resolving the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...
متن کامل